Publishing Data that Links Itself: A Conjecture

Authors

  • Giovanni Tummarello
  • Renaud Delbru

Abstract

With the advent of RDFa and its at least partial support by major search engines, semantically structured data is increasingly appearing on the Web. To enable high-value use cases, links between entity descriptions need to be established. The linked data model suggests that links should be stated explicitly by those who expose entity descriptions, but unlike on the normal Web, the incentives for doing so are unclear, so the model ultimately seems to fail in practice. In this position paper, we make the conjecture that explicit links are not needed for realizing the Semantic Web. We discuss how Record Linkage techniques are in general very well suited for the task, but argue the need for a tool that would allow data publishers to take an active role in producing entity descriptions that can then be linked automatically.

The need for linkage, the absence of links

The dream that inspired many Semantic Web researchers is that of a web where bits of information are discovered and connected automatically because they “matter” for the task at hand, possibly coming from any web location and ultimately reused well beyond the purpose for which they were originally created. Applied to news, this vision would allow a reader to get “second and third” points of view when reading about anything. Applied to commerce, it would ideally eliminate the need for advertising: sellers and suppliers would simply “be found” for the characteristics of their offer. Given that no imminent breakthrough is expected in the ways machines can understand content meant for human consumption, the idea of the Semantic Web initiative has been to propose that Web sites “lend a hand” to machines by encoding semantics using RDF. For years, however, RDF descriptions on the Web have been made available almost exclusively by web data enthusiasts, i.e. by the Semantic Web community itself.
Despite this, the community has been able to make available a remarkable amount of information, known as the Linked Open Data (LOD) cloud, to the point that many entities, e.g. encyclopedic entities but also the people participating in the community, are often “described” (have metadata about them in RDF) in several dozen different independent RDF sources on the Web. The existence of descriptions alone, however, is not a sufficient condition for this data to be discovered automatically. For this reason, the LOD community has been advocating the reuse of URIs from other sites as a way to create interlinks. In [1], it is explained that, to allow crawlers and agents to understand that a description is about something also described elsewhere, URIs from other sites should be used. For these URIs to be found, one should first manually select datasets from a maintained list of known datasets, then explore these to find suitable URIs to link to, and repeat this for each entity to be linked. It is suggested that automatic methods be used when linking multiple entities, e.g. [2], but especially in this case it is necessary to know a priori which specific dataset to link to and to configure the matching algorithms manually, something that requires a high degree of expertise. This complexity, together with the (arguably temporary) lack of immediate incentives for doing this, means that even within the LOD community formal data quality [5] and interlinks are scarce. A quick query on Sindice, currently indexing approximately 65 M semantic documents, shows that fewer than 4 million RDF documents (usually entity descriptions from the LOD cloud) exhibit at least one sameAs link¹.

Copyright © 2009, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. Supported by Science Foundation Ireland under Grant No. SFI/02/CE1/I131, by the OKKAM Project (ICT-215032).
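The paper does not commit to a particular matching algorithm. As a hedged illustration of the kind of Record Linkage technique the conjecture relies on, the following minimal sketch compares entity descriptions from two hypothetical sources by token overlap (Jaccard similarity) and proposes owl:sameAs links that neither publisher stated. The data, the property names, the `propose_same_as` helper and the 0.5 threshold are all assumptions made for illustration; this is not the authors' method, and real linkage tools use far richer comparators and blocking.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical data model: entity URI -> {property name: literal value}.
Description = Dict[str, str]

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def tokens(desc: Description) -> Set[str]:
    """Lower-cased alphanumeric tokens drawn from every literal value."""
    out: Set[str] = set()
    for value in desc.values():
        out.update(re.findall(r"[a-z0-9]+", value.lower()))
    return out


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_same_as(
    source_a: Dict[str, Description],
    source_b: Dict[str, Description],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """For each entity in A, keep its best-scoring entity in B if it reaches threshold."""
    b_tokens = {uri: tokens(d) for uri, d in source_b.items()}
    links: List[Tuple[str, str, float]] = []
    for uri_a, desc_a in source_a.items():
        ta = tokens(desc_a)
        best_uri: Optional[str] = None
        best_score = 0.0
        for uri_b, tb in b_tokens.items():
            score = jaccard(ta, tb)
            if score > best_score:
                best_uri, best_score = uri_b, score
        if best_uri is not None and best_score >= threshold:
            links.append((uri_a, best_uri, best_score))
    return links


# Two hypothetical sites describing the same people with different vocabularies.
site_a = {
    "http://example.org/a/tummarello": {"name": "Giovanni Tummarello", "org": "Example Institute"},
    "http://example.org/a/delbru": {"name": "Renaud Delbru", "org": "Example Institute"},
}
site_b = {
    "http://example.net/b/p1": {"fullName": "R. Delbru", "affiliation": "Example Institute"},
    "http://example.net/b/p2": {"fullName": "Giovanni Tummarello", "affiliation": "Example Institute"},
}

links = propose_same_as(site_a, site_b)
for a, b, score in links:
    # Emit each proposed link as an N-Triples statement.
    print(f"<{a}> <{OWL_SAME_AS}> <{b}> .")
```

Here "R. Delbru" still reaches a score of 0.6 against "Renaud Delbru", so both links are proposed even though the two sites share no URIs or property names. The sketch picks the best match per entity greedily and does not enforce one-to-one links, and it only works because the descriptions carry discriminating attributes, which hints at why the publisher's role in producing good descriptions matters.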
In the last year, however, LOD has ceased to be the only source of large amounts of RDF structured content. Thanks to the support of Google and Yahoo for RDFa-encoded content, used for advanced snippets, it is safe to say that tens of millions of pages of database-generated content have appeared, none of which, to the best of our knowledge, provides interlinks among descriptions on different websites.

Missing Links are here to stay

We believe that the problem of missing interlinks in RDFa descriptions, and of the small number of interlinks even in the datasets of the Semantic Web enthusiast community, is “here to stay” for many reasons. Links to the web increase the value of a site by making it “more useful” to visitors, the

¹ http://sindice.com/search?q=*+%3Cowl%3AsameAs%3E+*&qt=advanced, retrieved 1 Dec 2009
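The Sindice figure quoted earlier counts documents that contain at least one triple whose predicate is owl:sameAs, which is what the query in footnote 1 (`* <owl:sameAs> *`) matches. We do not know how Sindice evaluates that query internally; as a hedged sketch, assuming a crawl is available as (subject, predicate, object, source document) quads, the statistic reduces to a simple set computation. The quads, URIs and the `same_as_coverage` helper are hypothetical.

```python
from typing import Iterable, Set, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# (subject, predicate, object, source document), as in an N-Quads dump.
Quad = Tuple[str, str, str, str]


def same_as_coverage(quads: Iterable[Quad]) -> Tuple[int, int]:
    """Return (documents with at least one owl:sameAs triple, total documents)."""
    all_docs: Set[str] = set()
    linked_docs: Set[str] = set()
    for _s, predicate, _o, doc in quads:
        all_docs.add(doc)
        if predicate == OWL_SAME_AS:
            linked_docs.add(doc)
    return len(linked_docs), len(all_docs)


# Hypothetical crawl of three documents; only doc1 states a sameAs link.
quads = [
    ("http://example.org/a/p1", FOAF_NAME, '"Renaud Delbru"', "http://example.org/a/doc1"),
    ("http://example.org/a/p1", OWL_SAME_AS, "http://example.net/b/p1", "http://example.org/a/doc1"),
    ("http://example.org/c/x", FOAF_NAME, '"Giovanni Tummarello"', "http://example.org/c/doc2"),
    ("http://example.org/d/y", RDFS_LABEL, '"Some entity"', "http://example.org/d/doc3"),
]

linked, total = same_as_coverage(quads)
print(f"{linked} of {total} documents carry at least one owl:sameAs link")
```

Counting per document rather than per triple matches the phrasing of the statistic: a document with many sameAs triples still counts once.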



Publication date: 2010